Enterprise Storage Solutions For Data & Document Management
HSM in a heterogeneous Unix environment
By Randall L. Thorburn, VP
PLATINUM technology, inc.
Advanced Software Concepts Development Laboratory
The term hierarchical storage management (HSM) was coined to describe the function of managing file system space. The fact is, most file systems take on the look of a messy storage closet before long. On average, only about 25% of the information on a typical hard disk is used in any 90-day period. The remaining 75% just sits around taking up valuable space--much like the second set of golf clubs, the old sewing machine, and the bowling trophies that accumulate in the closet. An HSM system can transparently move seldom-used files from the hard disk to a storage device that uses media cheaper per MB than hard disk (e.g., optical disc or tape), while leaving behind a "stub" file to identify the original file location. To carry our analogy forward, HSM is like pasting up pictures of all the seldom-used stuff in the closet (to show what was once stored there), and moving the stuff out to the garage for long-term storage. (However, HSM systems automatically restore files with much less effort than it would take to haul that stuff back out of the garage.)
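The migrate-and-stub idea can be illustrated with a minimal Python sketch. It is not how any particular HSM product works: the stub marker, the function names, and the use of a plain directory as the "cheaper" storage are all assumptions made for illustration. A real system would hook into the file system, mark stubs in file system metadata rather than in file content, and recall files transparently on access.

```python
import os
import shutil
import time

STUB_PREFIX = "HSM-STUB:"  # hypothetical marker, not any product's stub format
SECONDS_PER_DAY = 86400


def is_stub(path):
    """Return True if the file starts with the illustrative stub marker."""
    with open(path, "rb") as f:
        return f.read(len(STUB_PREFIX)) == STUB_PREFIX.encode()


def find_candidates(root, max_idle_days=90, now=None):
    """Return files under root not accessed within max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Check the access time first; only idle files are opened.
            if os.stat(path).st_atime < cutoff and not is_stub(path):
                candidates.append(path)
    return sorted(candidates)


def migrate(path, archive_dir):
    """Copy the file to archive_dir, then replace it with a small stub."""
    os.makedirs(archive_dir, exist_ok=True)
    # Simplification: files with the same base name would collide here.
    target = os.path.join(archive_dir, os.path.basename(path))
    shutil.copy2(path, target)
    with open(path, "w") as f:
        f.write(STUB_PREFIX + target)  # the stub records where the data went
    return target


def recall(path):
    """Restore the original contents of a stubbed file."""
    with open(path) as f:
        target = f.read()[len(STUB_PREFIX):]
    shutil.copy2(target, path)
```

Calling find_candidates on a directory, passing each result to migrate, and later calling recall on a stub walks through the full closet-to-garage round trip.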
Prior to the '90s, HSM was available only for mainframe computer systems. The early '90s saw HSM capabilities arrive on some UNIX platforms. Within the last couple of years, HSM has become available on operating systems using Pentium and x86 technology as well.
The latest trends in HSM technology include: implementation within native file systems, heterogeneous platform support, scalable and flexible storage configurations, integration with backup functionality, and an emphasis on storage management.
Native file system support
Less functional HSM systems require a dedicated file system to be set up for use by the HSM system. In this type of implementation, data may only be automatically migrated from and retrieved to this dedicated file system. Data from any other file system must be manually moved to the HSM file system before it can be automatically managed. Additionally, should the migration system fail, the entire file system becomes inaccessible.
Newer HSM systems provide migration and restoration services for any native file system supported. A code module that provides the appearance that migrated files still reside locally is added to the file system and may be activated or deactivated from the command line. When deactivated, the migrated files have a new and unique name. This approach also provides support for any native file system attributes such as access control lists (ACLs), context dependent files (CDFs), or any other attributes which may be unique to the host file and operating system.
Support for heterogeneous environments
Some HSM products may be available for a variety of different computers and operating systems, but the versions may still not be compatible with one another. In that case, an entirely separate system must be purchased for each operating system requiring HSM services, and each one must be managed on its own.
More robust products support heterogeneous environments. Agents that provide automatic migration and restoration services are available for a variety of computers and operating systems, and they can all store their data on a common storage device.
Scalability and flexibility
Simple HSM products not only require a dedicated file system; some of them further require that file system to reside on the same computer that hosts the storage device holding the migrated data. Other products take a server-centric approach to storage management, in which dedicated HSM client machines are hard-coded to a particular server and storage device. This severely limits the scalability of an HSM storage configuration and results in a system that has a single point of failure.
The use of distributed, three-tier client-server technology, prevalent in some of the latest HSM products, provides highly flexible and scalable storage configurations. In this model, a client agent (which provides automated migration and retrieval services) resides on the file systems that require HSM services. The server agent (which drives and manages a storage device) resides on the computer hosting the storage device. A third agent, unique to this model--the global location broker (GLB)--resides on any computer in the same subnet as the computer hosting the storage device.
This model allows multiple storage devices to be used for HSM storage services. As the server agents are activated, they register themselves with their local GLB. As any of the client agents require storage resources for data migration or retrieval, they contact a GLB and are provided a listing and location of all active storage devices available. If they need to store data, they send a request for free space available to all server agents. They choose the storage device they will use by parameters including immediate availability and space available for the media type and format they require. If data is being retrieved, the client agents send each storage server agent a query for the data they require. The server agents search their databases and the appropriate server agent responds to the client agent's query. All of this takes place in milliseconds.
Storage configurations using this type of architecture are very flexible. If a computer hosting a server agent and storage device fails, the storage device can be moved to another host computer, the server agent restarted, and as soon as it registers itself with a GLB, all network client agents are provided with the new location as storage services are required. This type of architecture also provides storage configuration scalability. As new storage devices are added to the configuration they are automatically recognized as a new resource whenever the network client agents request storage resource locations from the GLB.
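The broker model described above can be sketched in a few lines of Python. Everything here is an in-memory illustration under stated assumptions: the class names, fields, and the "most free space wins" tie-break are hypothetical, and a real product would exchange these messages over the network rather than through direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class ServerAgent:
    """Illustrative stand-in for an agent that manages one storage device."""
    name: str
    host: str
    media_type: str
    free_mb: int
    busy: bool = False
    catalog: set = field(default_factory=set)  # IDs of files stored here


class GlobalLocationBroker:
    """Tracks only the current location of each active server agent."""

    def __init__(self):
        self._agents = {}

    def register(self, agent):
        # Re-registering under the same name replaces the old location,
        # which is how a device moved to a new host becomes visible.
        self._agents[agent.name] = agent

    def list_agents(self):
        return list(self._agents.values())


def choose_for_store(glb, size_mb, media_type):
    """Pick an idle agent of the right media type with the most free space."""
    fits = [a for a in glb.list_agents()
            if a.media_type == media_type
            and not a.busy
            and a.free_mb >= size_mb]
    return max(fits, key=lambda a: a.free_mb, default=None)


def locate(glb, file_id):
    """Query every agent; the one whose catalog holds file_id answers."""
    for agent in glb.list_agents():
        if file_id in agent.catalog:
            return agent
    return None
```

Because clients ask the broker for locations each time they need storage, moving a device to another host only requires the restarted server agent to call register again; no client configuration changes.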
Integration with backup
Network backup is another area in which new trends are developing. Because of the nature of some HSM systems, as backup software copies data to backup media it causes the migrated data to be retrieved from storage. This can cause major problems because the actual amount of data represented within a file system using HSM can be several times larger than what the storage device can physically hold.
More seasoned products can coexist with off-the-shelf backup utilities. They accommodate the backup process by either shutting off retrieval services while the backup product is running, or identifying the backup process through processes set up at installation.
Now, however, some HSM products provide network backup services. Some of these products provide multiple methods to ensure complete backups while avoiding mass data retrieval. This is possible because network backup functionality is tightly integrated with the HSM system to the point of sharing storage resources between the two processes. In these systems a backup agent that encounters a migrated file can be configured to either back up the representative or "stub" file (the file that provides the appearance of the migrated data residing locally), be directed to the storage device on which the migrated data resides and back it up from the media directly, or both.
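The three options for handling a migrated file can be expressed as a small planning function. This is a sketch under assumptions, not any product's backup logic: the policy names, the path-to-media-location mapping, and the "disk"/"media" source labels are all hypothetical.

```python
from enum import Enum


class MigratedPolicy(Enum):
    """Hypothetical settings for how a backup agent treats migrated files."""
    STUB_ONLY = "stub"
    DATA_ONLY = "data"
    BOTH = "both"


def plan_backup(files, policy):
    """Return (source, location) pairs a backup agent would copy.

    files maps each local path to the media location of its migrated
    data, or to None when the file is still resident on disk.
    """
    plan = []
    for path, media_location in sorted(files.items()):
        if media_location is None:
            plan.append(("disk", path))  # ordinary resident file
            continue
        if policy in (MigratedPolicy.STUB_ONLY, MigratedPolicy.BOTH):
            plan.append(("disk", path))  # back up the stub itself
        if policy in (MigratedPolicy.DATA_ONLY, MigratedPolicy.BOTH):
            # Read straight from the storage media; nothing is recalled to disk.
            plan.append(("media", media_location))
    return plan
```

In none of the three modes is the migrated data pulled back onto the local disk, which is what avoids the mass retrieval problem.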
Focus on Storage Management
Industry experts and analysts agree that while HSM and backup are fine solutions on their own, a comprehensive storage management approach including both technologies is the preferred method of dealing with data. A storage management solution should combine backup and HSM with media management facilities that keep track of each storage device's physical location and contents, so that administrators can quickly see online where data resides. The latest HSM products include media management facilities and integrate with full-featured media management products as well as backup tools.
HSM for open systems is already a viable alternative to other, equipment-intensive storage options such as multidisk solutions. Organizations using HSM systems that include the technologies mentioned here are already realizing a significant return on their storage software and hardware investments.
Platinum's NetArchive Storage Management product provides integrated HSM and backup functionality and is based on a distributed client-server technology. Its support for heterogeneous operating systems and native file systems is designed to promote successful operations in large enterprise environments.
A NetArchive HSM agent can use the services of any NetArchive-managed storage device, even if the two are hosted by different computer types. In addition, the software agent retains the native host computer file system functionality for each supported platform.
File features such as multiple file types, extended access control lists and context dependent files can be retained throughout a file's lifecycle.
Bundled with the HSM Agent, the NetArchive Storage Vault Manager keeps a database of the data within its device and notes where it is stored. It also provides media management in the form of media labeling, overwrite protection, space reuse, load balancing and volume grouping. Off-line media tracking enables media to be removed from a storage device and moved to a secured site out of the storage environment while still being managed by the vault manager.
NetArchive's HSM functionality is independent of the Vault Manager, which enables a NetArchive storage configuration to be scaled or changed as needed without requiring any changes or reconfiguration of the network clients or agents.
Randall Thorburn is vice president of the PLATINUM technology, inc. Advanced Software Concepts Development Laboratory (Escondido, CA). He has worked in the storage industry for eight years and has published a number of articles on network storage strategies. He may be reached at 800-991-7528 or thorburn@platinum.com.